
    On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective

    We implement and benchmark parallel I/O methods for the fully manycore-driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threaded data transformations for the I/O library ADIOS as a feasible way to trade underutilized host-side compute potential on heterogeneous systems for reduced I/O latency. Comment: 15 pages, 5 figures, accepted for DRBSD-1 in conjunction with ISC'1
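
    The core trade-off here, spending otherwise idle host-side CPU cycles on compression so that less data has to cross the I/O subsystem, can be sketched in a few lines. The following Python sketch is purely illustrative and is not the paper's ADIOS transform implementation; the function names and parameters are assumptions made for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def compress_chunk(chunk, level=6):
    """Compress one chunk of simulation output on a host CPU thread."""
    return zlib.compress(chunk.tobytes(), level)

def compress_parallel(data, n_chunks=8, n_threads=4):
    """Split an array into chunks and compress them on several host threads,
    trading spare host-side compute for a smaller I/O payload."""
    chunks = np.array_split(data, n_chunks)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(compress_chunk, chunks))

if __name__ == "__main__":
    field = np.sin(np.linspace(0.0, 50.0, 1_000_000))  # stand-in for a field dump
    compressed = compress_parallel(field)
    ratio = field.nbytes / sum(len(c) for c in compressed)
    print(f"compression ratio: {ratio:.2f}x")
```

    Because zlib releases the GIL during compression of large buffers, a thread pool is enough to use several host cores here; a real transform would of course hand the compressed chunks to the I/O layer rather than discard them.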

    Runtime I/O Re-Routing + Throttling on HPC Storage

    Massively parallel storage systems are becoming more and more prevalent on HPC systems due to the emergence of a new generation of data-intensive applications. To achieve the level of I/O throughput and capacity demanded by data-intensive applications, storage systems typically deploy a large number of storage devices (also known as LUNs or data stores). In doing so, parallel applications are allowed to access storage concurrently, and as a result, the aggregate I/O throughput can be increased linearly with the number of storage devices, reducing the application's end-to-end time. For a production system where storage devices are shared between multiple applications, contention is often a major problem leading to a significant reduction in I/O throughput. In this paper, we describe our efforts to resolve this issue in the context of HPC using a balanced re-routing + throttling approach. The proposed scheme re-routes I/O requests to a less congested storage location in a controlled manner so that write performance is improved while limiting the impact on read performance.
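
    The combination of re-routing and throttling can be illustrated with a toy sketch: pick the least congested storage target for each write and cap the rate at which requests are issued. This is a hypothetical illustration of the general idea, not the scheme described in the paper; the class and function names, the congestion proxy, and the rate limit are all assumptions.

```python
import time
from dataclasses import dataclass

@dataclass
class StorageTarget:
    name: str
    pending_bytes: int = 0  # crude proxy for congestion on this LUN

def pick_target(targets):
    """Re-route: direct the next write to the least congested storage target."""
    return min(targets, key=lambda t: t.pending_bytes)

def throttled_write(targets, payload, max_rate_bps=100e6):
    """Throttle: cap the issue rate so one writer cannot swamp shared storage."""
    target = pick_target(targets)
    target.pending_bytes += len(payload)      # request is now queued on that LUN
    # ... the actual I/O to `target` would be issued asynchronously here, and
    # pending_bytes decremented once it completes ...
    time.sleep(len(payload) / max_rate_bps)   # simple rate limiter
    return target.name

if __name__ == "__main__":
    luns = [StorageTarget(f"lun{i}") for i in range(4)]
    for _ in range(8):
        print(throttled_write(luns, b"x" * 1_000_000))  # writes spread across LUNs
```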

    Számítóháló alkalmazások teljesítményanalízise és optimalizációja = Performance analysis and optimisation of grid applications

    We investigated novel approaches to performance analysis and optimization for the efficient execution of grid applications, especially workflows. Taking into account the special requirements of grid performance analysis, we developed Mercury, a grid monitoring infrastructure. GRM, a performance monitor for parallel applications, has been integrated with the relational grid information system R-GMA as well as with Mercury. We developed versions of the Pulse and Prove visualisation tools that support grid performance analysis, and wrote a comprehensive state-of-the-art survey of grid performance tools. We designed a novel workflow abstraction layer for P-GRADE, together with a grid portal: using the portal, users can edit and execute workflow applications on the grid via a web browser. The portal supports multiple grid implementations, provides monitoring capabilities for performance analysis, and was prepared to recover failed runs. We tested the integration of the portal with grid resource brokers. Optimization may require migrating parts of the application to different resources, which in turn requires checkpointing support; we enhanced the checkpointing facilities of P-GRADE and coupled them to the Condor job scheduler. We also extended the system with a load-balancing module that is able to migrate processes depending on the load.

    Unraveling Diffusion in Fusion Plasma: A Case Study of In Situ Processing and Particle Sorting

    This work develops an in situ processing capability to study a particular diffusion process in magnetic confinement fusion. The diffusion process involves plasma particles that are likely to escape confinement; such particles carry a significant amount of energy from the burning plasma inside the tokamak to the divertor, damaging the divertor plate. The study requires in situ processing because of the fast-changing nature of the particle diffusion process. However, the in situ approach is challenging because the amount of data to be retained for the diffusion calculations grows over time, unlike other in situ processing cases where the amount of data to be processed is constant over time. Here we report our preliminary efforts to control the memory usage while ensuring that the necessary analysis tasks are completed in a timely manner. Compared with an earlier naive attempt to compute the same diffusion displacements directly in the simulation code, this in situ version reduces the memory usage for particle information by nearly 60% and the computation time by about 20%.
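
    One common way to bound the memory growth described above is to retain only a fixed-length window of each tracked particle's history. The sketch below is a minimal Python illustration of that idea using a per-particle ring buffer; it is not the paper's implementation, and the class, constants, and displacement formula are illustrative assumptions only.

```python
from collections import deque

import numpy as np

HISTORY_LEN = 16  # retain only the most recent positions per tracked particle

class DiffusionTracker:
    """Track displacements of selected particles with bounded memory."""

    def __init__(self, particle_ids):
        # A fixed-length deque per particle: old samples fall off automatically.
        self.history = {pid: deque(maxlen=HISTORY_LEN) for pid in particle_ids}

    def ingest(self, pid, position):
        """Called in situ at each step with the particle's current position."""
        self.history[pid].append(np.asarray(position, dtype=float))

    def displacement_sq(self, pid):
        """Squared displacement relative to the oldest retained sample."""
        h = self.history[pid]
        if len(h) < 2:
            return 0.0
        return float(np.sum((h[-1] - h[0]) ** 2))

if __name__ == "__main__":
    tracker = DiffusionTracker(particle_ids=[0, 1])
    for step in range(100):                      # memory stays bounded at HISTORY_LEN
        tracker.ingest(0, [step * 0.1, 0.0, 0.0])
    print(tracker.displacement_sq(0))
```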

    A Lightweight I/O Scheme to Facilitate Spatial and Temporal Queries of Scientific Data Analytics

    In the era of petascale computing, more scientific applications are being deployed on leadership-scale computing platforms to enhance scientific productivity. Many I/O techniques have been designed to address the growing I/O bottleneck on large-scale systems by handling massive scientific data in a holistic manner. While such techniques have been leveraged in a wide range of applications, they have not proven adequate for many mission-critical applications, particularly in the data post-processing stage. For example, some scientific applications generate datasets composed of a vast number of small data elements that are organized along many spatial and temporal dimensions but require sophisticated data analytics on one or more dimensions. Including such dimensional knowledge in the data organization can benefit the efficiency of data post-processing, yet it is often missing from existing I/O techniques. In this study, we propose a novel I/O scheme named STAR (Spatial and Temporal AggRegation) to enable high-performance data queries for scientific analytics. STAR is able to dive into the massive data, identify the spatial and temporal relationships among data variables, and accordingly organize them into an optimized multi-dimensional data structure before writing them to storage. This technique not only facilitates the common access patterns of data analytics but also reduces the application turnaround time. In particular, STAR enables efficient data queries along the time dimension, a practice common in scientific analytics but not yet well supported by existing I/O techniques. In our case study with GEOS-5, a critical climate modeling application, experimental results on the Jaguar supercomputer demonstrate an improvement of up to 73 times in read performance compared to the original I/O method.
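
    The benefit of folding spatial and temporal knowledge into the data layout can be seen in a small sketch: packing many small (time, cell, value) records into one dense array whose layout turns a time-series query for a single cell into a contiguous read. This is a toy illustration of the aggregation idea, not STAR's actual on-disk format; the array layout and function names are assumptions.

```python
import numpy as np

def aggregate(records, n_times, n_cells):
    """Pack small (time, cell, value) records into one dense (cell, time) array
    so that the time series of a single cell becomes a contiguous slice."""
    grid = np.full((n_cells, n_times), np.nan)  # cell-major: time varies fastest
    for t, c, v in records:
        grid[c, t] = v
    return grid

# Toy records scattered across time steps and grid cells.
records = [(t, c, float(10 * t + c)) for t in range(4) for c in range(3)]
grid = aggregate(records, n_times=4, n_cells=3)

# A time-dimension query for one cell is now a single contiguous read ...
print(grid[1, :])  # cell 1 across all time steps
# ... instead of touching many small, scattered data elements.
```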

    MGARD+: Optimizing Multilevel Methods for Error-Bounded Scientific Data Reduction

    Nowadays, data reduction is becoming increasingly important for dealing with the large amounts of scientific data. Existing multilevel compression algorithms offer a promising way to manage scientific data at scale but may suffer from relatively low performance and reduction quality. In this paper, we propose MGARD+, a multilevel data reduction and refactoring framework drawing on previous multilevel methods, to achieve high-performance data decomposition and high-quality error-bounded lossy compression. Our contributions are four-fold: 1) we propose a level-wise coefficient quantization method, which uses different error tolerances to quantize the multilevel coefficients; 2) we propose an adaptive decomposition method which treats the multilevel decomposition as a preconditioner and terminates the decomposition process at an appropriate level; 3) we leverage a set of algorithmic optimization strategies to significantly improve the performance of multilevel decomposition and recomposition; 4) we evaluate the proposed method using four real-world scientific datasets and compare it with several state-of-the-art lossy compressors. Experiments demonstrate that our optimizations improve the decomposition/recomposition performance of the existing multilevel method by up to 70x, and that the proposed compression method improves the compression ratio by up to 2x compared with other state-of-the-art error-bounded lossy compressors under the same level of data distortion.
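
    Contribution 1), level-wise quantization with a different error tolerance per level, can be illustrated with a toy one-level decomposition. The sketch below uses a simple Haar-style average/detail split rather than MGARD's actual multilevel transform; the tolerances and function names are illustrative assumptions.

```python
import numpy as np

def haar_decompose(x):
    """One level of a toy Haar-style split: coarse averages plus detail coefficients."""
    coarse = (x[0::2] + x[1::2]) / 2.0
    detail = (x[0::2] - x[1::2]) / 2.0
    return coarse, detail

def quantize(coeffs, tol):
    """Uniform quantization with bin width 2*tol bounds each coefficient's error by tol."""
    return np.round(coeffs / (2.0 * tol)).astype(np.int64)

def dequantize(q, tol):
    return q * (2.0 * tol)

x = np.sin(np.linspace(0.0, 3.0, 64))
coarse, detail = haar_decompose(x)
# Level-wise tolerances: the coarse level is quantized more tightly than the detail level.
qc, qd = quantize(coarse, 1e-4), quantize(detail, 1e-3)
rc, rd = dequantize(qc, 1e-4), dequantize(qd, 1e-3)
# Recompose and check that the toy scheme stays within the combined tolerances.
recon = np.empty_like(x)
recon[0::2] = rc + rd
recon[1::2] = rc - rd
print(np.max(np.abs(recon - x)))  # bounded by 1e-4 + 1e-3
```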

    Parallel in situ indexing for data-intensive computing

    As computing power increases exponentially, vast amounts of data are created by many scientific research activities. However, the bandwidth for storing data to disks and reading it back has been improving at a much slower pace. These two trends produce an ever-widening data access gap. Our work brings together two distinct technologies to address this data access issue: indexing and in situ processing. From decades of database research literature, we know that indexing is an effective way to address the data access issue, particularly for accessing a relatively small fraction of data records. As data sets increase in size, more and more analysts need selective data access, which makes indexing even more important for improving data access. The challenge is that most implementations of indexing technology are embedded in large database management systems (DBMS), but most scientific datasets are not managed by any DBMS. In this work, we choose to include indexes with the scientific data instead of requiring the data to be loaded into a DBMS. We use compressed bitmap indexes from the FastBit software, which are known to be highly effective for the query-intensive workloads common to scientific data analysis. To use the indexes, we need to build them first. The index building procedure needs to access the whole data set and may also require a significant amount of compute time. In this work, we adapt in situ processing technology to generate the indexes, thus removing the need to read data from disks and allowing the indexes to be built in parallel. The in situ data processing system used is ADIOS, a middleware for high-performance I/O. Our experimental results show that the indexes can improve the data access time by up to 200 times depending on the fraction of data selected, and that using the in situ data processing system can effectively reduce the time needed to create the indexes, by up to 10 times with our in situ technique under identical parallel settings.
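
    The idea behind bitmap indexing for selective access can be sketched with a toy, uncompressed bitmap index: one bitmap per value bin, so a range query is answered by OR-ing a few bitmaps instead of scanning the raw data. This is not FastBit's API or its compressed encoding; the functions and bin choices below are assumptions for illustration.

```python
import numpy as np

def build_bitmap_index(values, bin_edges):
    """Build one bitmap per value bin; bit i is set if record i falls in that bin."""
    bins = np.digitize(values, bin_edges)
    return {b: (bins == b) for b in np.unique(bins)}

def query(index, wanted_bins):
    """Answer 'which records fall in these bins?' by OR-ing a few bitmaps,
    touching only the index rather than the raw data."""
    mask = np.zeros_like(next(iter(index.values())))
    for b in wanted_bins:
        if b in index:
            mask |= index[b]
    return np.flatnonzero(mask)

values = np.random.default_rng(1).normal(size=1000)
index = build_bitmap_index(values, bin_edges=np.linspace(-3, 3, 13))
hits = query(index, wanted_bins=[11, 12, 13])  # records with the largest values
print(len(hits))
```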

    The First Provenance Challenge

    The first Provenance Challenge was set up to provide a forum for the community to help understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a Functional Magnetic Resonance Imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarise the participants' contributions.